Abstract

Sequential Monte Carlo methods, also known as particle methods, are a widely used set
of computational tools for inference in non-linear non-Gaussian state-space models. In many
applications it may be necessary to compute the sensitivity, or derivative, of the optimal filter
with respect to the static parameters of the state-space model; for instance, in order to obtain
maximum likelihood model parameters of interest, or to compute the optimal controller in an
optimal control problem. In Poyiadjis et al. [2011] an original particle algorithm to compute
the filter derivative was proposed and it was shown using numerical examples that the particle
estimate was numerically stable in the sense that it did not deteriorate over time. In this paper
we substantiate this claim with a detailed theoretical study. $L_p$ bounds and a central limit
theorem for this particle approximation of the filter derivative are presented. It is further shown
that under mixing conditions these $L_p$ bounds and the asymptotic variance characterized by
the central limit theorem are uniformly bounded with respect to the time index. We demonstrate
the performance predicted by theory with several numerical examples. We also use the
particle approximation of the filter derivative to perform online maximum likelihood parameter
estimation for a stochastic volatility model.

Some  key  words:  Hidden  Markov  Models,  State-Space  Models,  Sequential  Monte  Carlo, 
Smoothing,  Filter  derivative,  Recursive  Maximum  Likelihood. 


1  Introduction 


State-space models are a very popular class of non-linear and non-Gaussian time series models in
statistics, econometrics and information engineering; see for example Cappé et al. [2005], Doucet et al.
[2001], Durbin and Koopman [2001]. A state-space model is comprised of a pair of discrete-time
stochastic processes, $\{X_n\}_{n\ge 0}$ and $\{Y_n\}_{n\ge 0}$, where the former is an $\mathcal{X}$-valued unobserved process
and the latter is a $\mathcal{Y}$-valued process which is observed. The hidden process $\{X_n\}_{n\ge 0}$ is a Markov
process with initial law $dx\,\pi_\theta(x)$ and time homogeneous transition law $dx'\,f_\theta(x'|x)$, i.e.

$$X_0 \sim dx_0\,\pi_\theta(x_0) \quad\text{and}\quad X_n \,|\, (X_{n-1} = x_{n-1}) \sim dx_n\, f_\theta(x_n|x_{n-1}), \quad n \ge 1. \qquad (1.1)$$
It is assumed that the observations $\{Y_n\}_{n\ge 0}$ conditioned upon $\{X_n\}_{n\ge 0}$ are statistically independent
and have marginal laws

$$Y_n \,|\, (\{X_k\}_{k\ge 0} = \{x_k\}_{k\ge 0}) \sim dy_n\, g_\theta(y_n|x_n). \qquad (1.2)$$


Here $\pi_\theta(x)$, $f_\theta(x|x')$ and $g_\theta(y|x)$ are densities with respect to (w.r.t.) suitable dominating measures
denoted generically as $dx$ and $dy$. For example, if $\mathcal{X} \subseteq \mathbb{R}^p$ and $\mathcal{Y} \subseteq \mathbb{R}^q$ then the dominating measures
could be the Lebesgue measures. The variable $\theta$ in the densities denotes the particular parameters of
the model. The set of possible values for $\theta$, denoted $\Theta$, is assumed to be an open subset of $\mathbb{R}^d$. The
model (1.1)-(1.2) is also often referred to as a hidden Markov model in the literature [Cappé et al.,
2005].


For a sequence $\{z_n\}_{n\ge 0}$ and integers $i$, $j$, let $z_{i:j}$ denote the set $\{z_i, z_{i+1}, \ldots, z_j\}$, which is empty
if $j < i$. Equations (1.1) and (1.2) define the law of $(X_{0:n}, Y_{0:n-1})$ which is given by the measure

$$dx_0\,\pi_\theta(x_0) \prod_{k=1}^{n} dx_k\, f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} dy_k\, g_\theta(y_k|x_k), \qquad (1.3)$$

from which the probability density of the observed process, or likelihood, is obtained

$$p_\theta(y_{0:n-1}) = \int dx_0\,\pi_\theta(x_0) \prod_{k=1}^{n} dx_k\, f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} g_\theta(y_k|x_k). \qquad (1.4)$$


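For a realization of observations $Y_{0:n-1} = y_{0:n-1}$, let $\mathbb{Q}_{\theta,n}$ denote the law of $X_{0:n}$ conditioned on this
sequence of observed variables, i.e.

$$\mathbb{Q}_{\theta,n}(dx_{0:n}) = \frac{1}{p_\theta(y_{0:n-1})}\, dx_0\,\pi_\theta(x_0)\, g_\theta(y_0|x_0) \left( \prod_{k=1}^{n-1} dx_k\, f_\theta(x_k|x_{k-1})\, g_\theta(y_k|x_k) \right) dx_n\, f_\theta(x_n|x_{n-1}).$$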


Let $\eta_{\theta,n}$ denote the time $n$ marginal of $\mathbb{Q}_{\theta,n}$. This marginal, which we call the filter, may be computed
recursively using Bayes' formula:

$$\eta_{\theta,n+1}(dx_{n+1}) = \Phi_{\theta,n+1}(\eta_{\theta,n})(dx_{n+1}) = \frac{dx_{n+1} \int \eta_{\theta,n}(dx_n)\, g_\theta(y_n|x_n)\, f_\theta(x_{n+1}|x_n)}{\int \eta_{\theta,n}(dx'_n)\, g_\theta(y_n|x'_n)}, \quad n \ge 0,$$

and $\eta_{\theta,0} = \pi_\theta$ by convention. Except for simple models such as the linear Gaussian state-space model
or when $\mathcal{X}$ is a finite set, it is impossible to compute $p_\theta(y_{0:n})$, $\mathbb{Q}_{\theta,n}$ or $\eta_{\theta,n}$ exactly. Particle methods
have been applied extensively to approximate these quantities for general state-space models of the
form (1.1)-(1.2); see Cappé et al. [2005], Doucet et al. [2001].
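When $\mathcal{X}$ is a finite set the recursion above reduces to a few array operations. The following is a minimal sketch under that assumption; the function name `filter_step` and the two-state numbers are illustrative and do not come from the paper.

```python
import numpy as np

def filter_step(eta, g_y, F):
    """One step of the filter recursion on a finite state space.

    eta : (S,) law of X_n given y_{0:n-1} (the filter at time n)
    g_y : (S,) likelihood values g(y_n | x) for each state x
    F   : (S, S) transition matrix with F[x, x_next] = f(x_next | x)
    Returns the law of X_{n+1} given y_{0:n}.
    """
    weighted = eta * g_y                  # eta_n(dx_n) g(y_n | x_n)
    return (weighted @ F) / weighted.sum()

# Hypothetical two-state example.
F = np.array([[0.9, 0.1], [0.2, 0.8]])
eta0 = np.array([0.5, 0.5])               # eta_0 = pi by convention
g_y0 = np.array([0.7, 0.1])               # g(y_0 | x) for x = 0, 1
eta1 = filter_step(eta0, g_y0, F)
```

The numerator and denominator of the recursion share the weights $\eta_{\theta,n}(dx_n)\,g_\theta(y_n|x_n)$, so they are computed once and reused.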

The particle approximation of $\mathbb{Q}_{\theta,n}$ is the empirical measure corresponding to a set of $N \ge 1$
random samples termed particles, that is

$$\mathbb{Q}^{p,N}_{\theta,n}(dx_{0:n}) = \frac{1}{N}\sum_{i=1}^{N} \delta_{X^{(i)}_{0:n}}(dx_{0:n}) \qquad (1.5)$$

where $\delta_z(dz)$ denotes the Dirac delta mass located at $z$. This approximation is referred to as the
path space approximation [Del Moral, 2004] and it is denoted by the superscript 'p'. The particle
approximation of $\eta_{\theta,n}$ is obtained from $\mathbb{Q}^{p,N}_{\theta,n}$ by marginalization

$$\eta^{N}_{\theta,n}(dx_n) = \frac{1}{N}\sum_{i=1}^{N} \delta_{X^{(i)}_{n}}(dx_n).$$
These particles are propagated in time using importance sampling and resampling steps; see Doucet et al.
[2001] and Cappé et al. [2005] for a review of the literature. Specifically, $\mathbb{Q}^{p,N}_{\theta,n+1}$ is the empirical
measure constructed from $N$ independent samples from

$$\frac{\mathbb{Q}^{p,N}_{\theta,n}(dx_{0:n})\, dx_{n+1}\, f_\theta(x_{n+1}|x_n)\, g_\theta(y_n|x_n)}{\int \mathbb{Q}^{p,N}_{\theta,n}(dx_{0:n})\, g_\theta(y_n|x_n)}. \qquad (1.6)$$


It is a well known fact that the particle approximation of $\mathbb{Q}_{\theta,n}$ becomes progressively impoverished
as $n$ increases because of the successive resampling steps [Del Moral and Doucet, 2003, Olsson et al.,
2008]. That is, the number of distinct particles representing the marginal $\mathbb{Q}^{p,N}_{\theta,n}(dx_{0:k})$ for any fixed
$k < n$ diminishes as $n$ increases until it collapses to a single particle; this is known as the particle
path degeneracy problem.
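The collapse is easy to reproduce numerically. The sketch below tracks which time-0 ancestor each path descends from under repeated multinomial resampling; it assumes equal weights at every step (the most favourable case for diversity), and the function name and particle counts are illustrative only.

```python
import numpy as np

def distinct_time0_ancestors(num_particles, num_steps, seed=0):
    """Number of distinct time-0 ancestors after each resampling step."""
    rng = np.random.default_rng(seed)
    ancestor = np.arange(num_particles)   # label of the time-0 ancestor of each path
    counts = [num_particles]
    for _ in range(num_steps):
        # Multinomial resampling with flat weights (an assumption made here;
        # uneven weights only accelerate the loss of diversity).
        idx = rng.integers(0, num_particles, size=num_particles)
        ancestor = ancestor[idx]
        counts.append(len(np.unique(ancestor)))
    return counts

counts = distinct_time0_ancestors(num_particles=100, num_steps=2000)
```

The count can never increase, since resampling only copies or discards existing paths; once it reaches one, every path shares the same time-0 state.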

The focus of this paper is on the convergence properties of particle methods which have been recently
proposed to approximate the derivative of the measures $\{\eta_{\theta,n}(dx_n)\}_{n\ge 0}$ w.r.t. $\theta = [\theta_1, \ldots, \theta_d]^{T} \in \mathbb{R}^d$:

$$\zeta_{\theta,n} = \nabla \eta_{\theta,n} = \left[ \frac{\partial \eta_{\theta,n}}{\partial \theta_1}, \ldots, \frac{\partial \eta_{\theta,n}}{\partial \theta_d} \right]^{T}.$$

(See Section 2 for a definition.) References Cérou et al. [2001] and Doucet and Tadić [2003] present
particle methods which have a computational complexity that scales linearly with the number $N$
of particles. It was shown in Poyiadjis et al. [2011] (see also Poyiadjis et al. [2009] for a more
detailed numerical study) that the performance of these $O(N)$ methods, which inherently rely on the
particle approximations of $\{\mathbb{Q}_{\theta,n}\}_{n\ge 0}$ constructed as in (1.6) above, degraded over time and it was
conjectured that this may be attributed to the particle path degeneracy problem. In contrast, the
alternative method of Poyiadjis et al. [2005] was shown in numerical examples to be stable. The
method of Poyiadjis et al. [2005] is a non-standard particle implementation that avoids the
particle path degeneracy problem at the expense of a computational complexity per time step which is
quadratic in the number of particles, i.e. $O(N^2)$; see Section 2 for more details. Supported by
numerical examples, it was conjectured in Poyiadjis et al. [2011] that even under strong mixing
assumptions, the variance of the estimate of the filter derivative computed with the $O(N)$ methods
increases at least linearly in time while that of the $O(N^2)$ method is uniformly bounded w.r.t. the time index.
This conjecture is confirmed in this paper. Specifically, we analyze the $O(N^2)$ implementation of
Poyiadjis et al. [2005] in Section 3 and obtain results on the errors of the approximation; in
particular, $L_p$ bounds and a Central Limit Theorem (CLT) are presented. We show that these $L_p$ bounds
and asymptotic variances appearing in the CLT are uniformly bounded w.r.t. the time index when
the state-space model satisfies certain mixing assumptions. In contrast, the asymptotic variance of
the $O(N)$ implementations, which is also captured through the CLT, is shown to increase linearly.
To the best of our knowledge, these are the first results of this kind.

An important application of our results, which is discussed in detail in Section 4, is to the
problem of estimating the parameters of the model (1.1)-(1.2) from observed data. The estimates
of the model parameters are found by maximizing the likelihood function $p_\theta(y_{0:n})$ with respect to $\theta$
using a gradient ascent algorithm which relies on the particle approximation of the filter derivative.
The results we present in Section 3 have bearing on the performance of the parameter estimation
algorithm, which we illustrate with numerical examples in Section 4. The Appendix contains the
proofs of the main results as well as that of some supporting auxiliary results. As a final remark,
although the algorithms and theoretical results are presented for a state-space model, they may be
reinterpreted for Feynman-Kac models as well.
1.1  Notation  and  definitions


We give some basic definitions from probability and operator semigroup theory. For a measurable
space $(E, \mathcal{E})$ let $\mathcal{M}(E)$ denote the set of all finite signed measures and $\mathcal{P}(E)$ the set of all probability
measures on $E$. The $n$-fold product space $E \times \cdots \times E$ is denoted by $E^n$. Let $\mathcal{B}(E)$ denote the Banach
space of all bounded real-valued and measurable functions $\varphi : E \to \mathbb{R}$ equipped with the uniform
norm $\|\varphi\| = \sup_{x\in E}|\varphi(x)|$. For $\nu \in \mathcal{M}(E)$ and $\varphi \in \mathcal{B}(E)$, let $\nu(\varphi) = \int \nu(dx)\,\varphi(x)$ be the Lebesgue
integral of $\varphi$ w.r.t. $\nu$. If $\nu$ is a density w.r.t. some dominating measure $dx$ on $E$ then $\nu(\varphi) = \int dx\,
\nu(x)\,\varphi(x)$. We recall that a bounded integral kernel $M(x, dx')$ from a measurable space $(E, \mathcal{E})$ into
an auxiliary measurable space $(E', \mathcal{E}')$ is an operator $\varphi \mapsto M(\varphi)$ from $\mathcal{B}(E')$ into $\mathcal{B}(E)$ such that the
functions

$$x \mapsto M(\varphi)(x) := \int_{E'} M(x, dx')\,\varphi(x')$$

are $\mathcal{E}$-measurable and bounded for any $\varphi \in \mathcal{B}(E')$. The kernel $M$ also generates a dual operator
$\nu \mapsto \nu M$ from $\mathcal{M}(E)$ into $\mathcal{M}(E')$ defined by

$$(\nu M)(\varphi) := \nu(M(\varphi)).$$

Given a pair of bounded integral operators $(M_1, M_2)$, we let $(M_1 M_2)$ be the composition operator
defined by $(M_1 M_2)(\varphi) = M_1(M_2(\varphi))$.

A Markov kernel is a positive and bounded integral operator $M$ such that $M(1)(x) = 1$ for any
$x \in E$. For $\varphi \in \mathcal{B}(E)$, let

$$\mathrm{osc}(\varphi) = \sup_{x,x'\in E} |\varphi(x) - \varphi(x')|$$

and let

$$\mathrm{Osc}_1(E) = \{\varphi \in \mathcal{B}(E) : \mathrm{osc}(\varphi) \le 1\}.$$

Let $\beta(M) \in [0,1]$ denote the Dobrushin coefficient of the Markov kernel $M$ which is defined by the
formula [Del Moral, 2004, Prop. 4.2.1]:

$$\beta(M) := \sup\,\{\mathrm{osc}(M(\varphi)) \,;\, \varphi \in \mathrm{Osc}_1(E')\}.$$

If there exists a positive constant $\rho$ such that the Markov kernel $M$ satisfies

$$M(x, dz) \ge \rho\, M(x', dz) \quad\text{for all } x, x' \in E$$

then $\beta(M) \le 1 - \rho$.

For two Markov kernels $M_1$, $M_2$, $\beta(M_1 M_2) \le \beta(M_1)\,\beta(M_2)$.
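On a finite state space these quantities can be checked directly. The sketch below uses the standard characterisation of the Dobrushin coefficient as the largest total variation distance between two rows of the transition matrix; the matrices and the helper name `dobrushin` are made up for illustration.

```python
import numpy as np

def dobrushin(M):
    """Dobrushin coefficient of a row-stochastic matrix M, computed as the
    largest total variation distance between two of its rows."""
    l1 = np.abs(M[:, None, :] - M[None, :, :]).sum(axis=2)   # pairwise L1 distances
    return 0.5 * l1.max()

M1 = np.array([[0.9, 0.1], [0.2, 0.8]])
M2 = np.array([[0.5, 0.5], [0.3, 0.7]])

b1, b2, b12 = dobrushin(M1), dobrushin(M2), dobrushin(M1 @ M2)

# Largest rho with M1(x, z) >= rho * M1(x', z) for all x, x', z.
rho = (M1[:, None, :] / M1[None, :, :]).min()
```

For these matrices the contraction bound $\beta(M_1 M_2) \le \beta(M_1)\beta(M_2)$ and the minorization bound $\beta(M_1) \le 1 - \rho$ can both be verified by inspecting `b1`, `b2`, `b12` and `rho`.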

Given a positive function $G$ on $E$, let $\Psi_G : \nu \in \mathcal{P}(E) \mapsto \Psi_G(\nu) \in \mathcal{P}(E)$ be the probability
distribution defined by

$$\Psi_G(\nu)(dx) := \frac{\nu(dx)\,G(x)}{\nu(G)}$$

provided $\infty > \nu(G) > 0$. The definitions above also apply if $\nu$ is a density and $M$ is a transition
density. In this case all instances of $\nu(dx)$ should be replaced with $dx\,\nu(x)$ and $M(x, dx')$ by $dx'\,M(x, x')$
where $dx$ and $dx'$ are generic notation for the dominating measures.

It is convenient to introduce the following transition kernels:

$$Q_{\theta,n}(x_{n-1}, dx_n) = g_\theta(y_{n-1}|x_{n-1})\, dx_n\, f_\theta(x_n|x_{n-1}) = dx_n\, q_\theta(x_n|x_{n-1}), \quad n > 0,$$

$$Q_{\theta,k,n}(x_k, dx_n) = (Q_{\theta,k+1} Q_{\theta,k+2} \cdots Q_{\theta,n})(x_k, dx_n), \quad 0 \le k \le n,$$

with the convention that $Q_{\theta,n,n} = Id$, the identity operator. Note that $Q_{\theta,k,n}(1)(x_k)$ is the density
of the law of $Y_{k:n-1}$ given $X_k = x_k$. For $0 \le p \le n$, define the potential function $G_{\theta,p,n}$ on $\mathcal{X}$ to be

$$G_{\theta,p,n}(x_p) = Q_{\theta,p,n}(1)(x_p) \,/\, \eta_{\theta,p} Q_{\theta,p,n}(1). \qquad (1.7)$$


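Let the mapping $\Phi_{\theta,k,n} : \mathcal{P}(\mathcal{X}) \to \mathcal{P}(\mathcal{X})$, $0 \le k \le n$, be defined as follows

$$\Phi_{\theta,k,n}(\nu)(dx_n) = \frac{\nu Q_{\theta,k,n}(dx_n)}{\nu Q_{\theta,k,n}(1)}.$$

It follows that $\eta_{\theta,n} = \Phi_{\theta,k,n}(\eta_{\theta,k})$. For conciseness, we also write $\Phi_{\theta,n-1,n}$ as $\Phi_{\theta,n}$.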

A key quantity that facilitates the recursive computation of the derivative of $\eta_{\theta,n}$ is the following
collection of backward Markov transition kernels:

$$M_{\theta,n}(x_n, dx_{n-1}) = \frac{\eta_{\theta,n-1}(dx_{n-1})\, q_\theta(x_n|x_{n-1})}{\eta_{\theta,n-1}(q_\theta(x_n|\cdot))}, \quad n > 0. \qquad (1.8)$$

Their particle approximations are

$$M^{N}_{\theta,n}(x_n, dx_{n-1}) = \frac{\eta^{N}_{\theta,n-1}(dx_{n-1})\, q_\theta(x_n|x_{n-1})}{\eta^{N}_{\theta,n-1}(q_\theta(x_n|\cdot))}. \qquad (1.9)$$

These backward Markov kernels are convenient for computing certain conditional expectations and
probability measures. In particular, for $\varphi \in \mathcal{B}(\mathcal{X}^2)$, we have

$$\mathbb{E}_\theta[\varphi(X_{n-1}, X_n) \,|\, y_{0:n-1}, x_n] = \int M_{\theta,n}(x_n, dx_{n-1})\, \varphi(x_{n-1}, x_n),$$

and the law of $X_{0:n-1}$ given $X_n = x_n$ and $Y_{0:n-1} = y_{0:n-1}$ is $M_{\theta,n}(x_n, dx_{n-1}) \cdots M_{\theta,1}(x_1, dx_0)$.
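Finally, the following two definitions are needed for the CLT of the particle approximation of
the derivative of $\eta_{\theta,n}$. The bounded integral operator $D_{\theta,k,n}$ from $\mathcal{X}$ into $\mathcal{X}^{n+1}$ is defined for any
$F_n \in \mathcal{B}(\mathcal{X}^{n+1})$ by

$$D_{\theta,k,n}(F_n)(x_k) := \int \left( \prod_{j=1}^{k} M_{\theta,j}(x_j, dx_{j-1}) \right) \left( \prod_{j=k}^{n-1} Q_{\theta,j+1}(x_j, dx_{j+1}) \right) F_n(x_{0:n}), \quad 0 \le k \le n, \qquad (1.10)$$

with the convention that $\prod_{\emptyset} = 1$. The particle approximation, $D^{N}_{\theta,k,n}$, is defined to be

$$D^{N}_{\theta,k,n}(F_n)(x_k) := \int \left( \prod_{j=1}^{k} M^{N}_{\theta,j}(x_j, dx_{j-1}) \right) \left( \prod_{j=k}^{n-1} Q_{\theta,j+1}(x_j, dx_{j+1}) \right) F_n(x_{0:n}). \qquad (1.11)$$

To be concise we write

$$\eta_{\theta,k}(dx_k)\, D_{\theta,k,n}(x_k, dx_{0:k-1}, dx_{k+1:n}) \quad\text{as}\quad \eta_{\theta,k} D_{\theta,k,n}(dx_{0:n}).$$

(And similarly for the particle versions.) Although convention dictates that $\eta_{\theta,k} D_{\theta,k,n}$ should be
understood as the measure $(\eta_{\theta,k} D_{\theta,k,n})(dx_{0:k-1}, dx_{k+1:n})$, when we mean otherwise it should be
clear from the infinitesimal neighborhood.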
2  Computing  the  filter  derivative

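For any $F_n \in \mathcal{B}(\mathcal{X}^{n+1})$ we have

$$\nabla \mathbb{Q}_{\theta,n}(F_n) = \frac{1}{p_\theta(y_{0:n-1})} \int dx_{0:n}\, \nabla\!\left( \pi_\theta(x_0) \prod_{k=1}^{n} f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} g_\theta(y_k|x_k) \right) F_n(x_{0:n})$$
$$\qquad - \frac{1}{p_\theta(y_{0:n-1})}\, \mathbb{E}_\theta\{F_n(X_{0:n}) \,|\, y_{0:n-1}\} \int dx_{0:n}\, \nabla\!\left( \pi_\theta(x_0) \prod_{k=1}^{n} f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} g_\theta(y_k|x_k) \right)$$
$$= \mathbb{E}_\theta\{F_n(X_{0:n})\, T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}\} - \mathbb{E}_\theta\{F_n(X_{0:n}) \,|\, y_{0:n-1}\}\, \mathbb{E}_\theta\{T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}\} \qquad (2.1)$$

where

$$T_{\theta,n}(x_{0:n}) = \sum_{k=0}^{n} t_{\theta,k}(x_{k-1}, x_k), \qquad (2.2)$$

$$t_{\theta,k}(x_{k-1}, x_k) = \nabla \log\big( g_\theta(y_{k-1}|x_{k-1})\, f_\theta(x_k|x_{k-1}) \big), \quad k > 0, \qquad (2.3)$$

$$t_{\theta,0}(x_{-1}, x_0) = t_{\theta,0}(x_0) = \nabla \log \pi_\theta(x_0). \qquad (2.4)$$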


The first equality in (2.1) follows from the definition of $\mathbb{Q}_{\theta,n}$ and interchanging the order of
differentiation and integration. The interchange is permissible under certain regularity conditions [Pflug,
1996]; e.g. a sufficient condition would be the main assumption in Section 3 under which the
uniform stability results are proved. The second equality follows from a change of measure, which
then permits an importance sampling based estimator for the derivative of $\mathbb{Q}_{\theta,n}$; this is the well
known score method, e.g. see [Pflug, 1996, Section 4.2.1]. For any $\varphi_n \in \mathcal{B}(\mathcal{X})$, it follows by setting
$F_n(x_{0:n}) = \varphi_n(x_n)$ in (2.1) that

$$\nabla \int \eta_{\theta,n}(dx_n)\, \varphi_n(x_n)
= \mathbb{E}_\theta\{\varphi_n(X_n)\, T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}\} - \mathbb{E}_\theta\{\varphi_n(X_n) \,|\, y_{0:n-1}\}\, \mathbb{E}_\theta\{T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}\}
= \int \zeta_{\theta,n}(dx_n)\, \varphi_n(x_n)$$

where

$$\zeta_{\theta,n}(dx_n) = \eta_{\theta,n}(dx_n) \big( \mathbb{E}_\theta[T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}, x_n] - \mathbb{E}_\theta[T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}] \big). \qquad (2.5)$$

We call $\zeta_{\theta,n}$ the derivative of $\eta_{\theta,n}$.
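As a reading aid, the change of measure behind the second equality in (2.1) can be spelled out using only the identity $\nabla h = h\, \nabla \log h$ for a positive differentiable function $h$. Applied to the joint density of $(X_{0:n}, Y_{0:n-1})$ it gives

$$\nabla\!\left( \pi_\theta(x_0) \prod_{k=1}^{n} f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} g_\theta(y_k|x_k) \right) = \left( \pi_\theta(x_0) \prod_{k=1}^{n} f_\theta(x_k|x_{k-1}) \prod_{k=0}^{n-1} g_\theta(y_k|x_k) \right) \left( \nabla \log \pi_\theta(x_0) + \sum_{k=1}^{n} \nabla \log\big( g_\theta(y_{k-1}|x_{k-1})\, f_\theta(x_k|x_{k-1}) \big) \right).$$

The last bracket is the sum of the terms $t_{\theta,k}$, $0 \le k \le n$, so dividing by $p_\theta(y_{0:n-1})$ and integrating against $F_n$ yields $\mathbb{E}_\theta\{F_n(X_{0:n})\, T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}\}$; taking $F_n = 1$ gives $\nabla p_\theta(y_{0:n-1}) / p_\theta(y_{0:n-1}) = \mathbb{E}_\theta\{T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}\}$, which accounts for the second term.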

Given the particle approximation (1.5) of $\mathbb{Q}_{\theta,n}$, it is straightforward to construct a particle
approximation of $\zeta_{\theta,n}$:

$$\zeta^{p,N}_{\theta,n}(dx_n) = \frac{1}{N}\sum_{i=1}^{N} \left( T_{\theta,n}\big(X^{(i)}_{0:n}\big) - \frac{1}{N}\sum_{j=1}^{N} T_{\theta,n}\big(X^{(j)}_{0:n}\big) \right) \delta_{X^{(i)}_{n}}(dx_n). \qquad (2.6)$$

This approximation is also referred to as the path space method. Such approximations were implicitly
proposed in Cérou et al. [2001] and Doucet and Tadić [2003] and there are several reasons why this
estimate appears attractive. Firstly, even with the resampling steps in the construction of $\mathbb{Q}^{p,N}_{\theta,n}$,
$\zeta^{p,N}_{\theta,n}$ can be computed recursively. Secondly, there is no need to store the entire ancestry of each
particle, i.e. $\{X^{(i)}_{0:n}\}_{1\le i\le N}$, and thus the memory requirement to construct $\zeta^{p,N}_{\theta,n}$ is constant over
time. Thirdly, the computational cost per time is $O(N)$. However, as $\mathbb{Q}^{p,N}_{\theta,n}$ suffers from the particle
path degeneracy problem, we expect the approximation $\zeta^{p,N}_{\theta,n}$ to worsen over time. This was indeed
observed in numerical examples in Poyiadjis et al. [2011] and it was conjectured that the asymptotic
variance (i.e. as $N \to \infty$) of $\zeta^{p,N}_{\theta,n}$ for bounded integrands would increase linearly with $n$ even under
strong mixing assumptions. This is now proven in this article.
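The path space estimate (2.6) can be sketched in a few lines for a toy model. The model below is an assumption made for illustration only and is not taken from the paper: $X_0 \sim \mathcal{N}(0,1)$, $X_n = \theta X_{n-1} + V_n$, $Y_n = X_n + W_n$ with standard Gaussian noise, so that $t_{\theta,0} = 0$ and $t_{\theta,n}(x_{n-1}, x_n) = (x_n - \theta x_{n-1})\,x_{n-1}$. The function name is likewise hypothetical.

```python
import numpy as np

def path_space_derivative(y, theta, num_particles, phi, seed=0):
    """O(N) path space estimates of zeta_{theta,n}(phi) for n = 0..len(y).

    Toy model (an assumption): X_0 ~ N(0,1), X_n = theta*X_{n-1} + V_n,
    Y_n = X_n + W_n, with V_n, W_n standard Gaussian.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(num_particles)    # X_0^{(i)} ~ pi (theta-free here)
    T = np.zeros(num_particles)               # T_{theta,0} = grad log pi = 0
    estimates = [np.mean(phi(x) * (T - T.mean()))]
    for y_prev in y:                          # y_prev plays the role of y_{n-1}
        logw = -0.5 * (y_prev - x) ** 2       # log g(y_{n-1} | X_{n-1}^{(i)}) + const
        w = np.exp(logw - logw.max())
        w /= w.sum()
        a = rng.choice(num_particles, size=num_particles, p=w)  # resample whole paths
        x_prev, T_prev = x[a], T[a]
        x = theta * x_prev + rng.standard_normal(num_particles)
        # Each path carries its own running sum T_{theta,n}(X_{0:n}^{(i)}).
        T = T_prev + (x - theta * x_prev) * x_prev
        estimates.append(np.mean(phi(x) * (T - T.mean())))      # equation (2.6)
    return np.array(estimates)

est_path = path_space_derivative(np.array([1.0]), theta=0.5,
                                 num_particles=20000, phi=lambda x: x)
```

Only the current state and the scalar running sum are stored per particle, which is the constant memory property noted above; the price is that the sums are inherited through resampling and therefore share the degenerate ancestry.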


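An alternative particle method to approximate $\{\zeta_{\theta,n}\}_{n\ge 0}$ has been proposed in Poyiadjis et al.
[2005, 2011]. We now reinterpret this method using the representation in (2.5) and a different particle
approximation of $\mathbb{Q}_{\theta,n}$ that avoids the path degeneracy problem.

The measure $\mathbb{Q}_{\theta,n}$ admits the following backward representation

$$\mathbb{Q}_{\theta,n}(dx_{0:n}) = \eta_{\theta,n}(dx_n) \prod_{k=1}^{n} M_{\theta,k}(x_k, dx_{k-1})$$

and the corresponding particle approximation of $\mathbb{Q}_{\theta,n}$ is given by

$$\mathbb{Q}^{N}_{\theta,n}(dx_{0:n}) = \eta^{N}_{\theta,n}(dx_n) \prod_{k=1}^{n} M^{N}_{\theta,k}(x_k, dx_{k-1})$$

where $M^{N}_{\theta,k}$ was defined in (1.9). This now gives rise to the following particle approximation of
$\zeta_{\theta,n}$ [Poyiadjis et al., 2005, 2011]:

$$\zeta^{N}_{\theta,n}(\varphi_n) = \int \mathbb{Q}^{N}_{\theta,n}(dx_{0:n})\, T_{\theta,n}(x_{0:n}) \left( \varphi_n(x_n) - \eta^{N}_{\theta,n}(\varphi_n) \right)$$

and indeed $\eta^{N}_{\theta,n}(\varphi_n) = \int \mathbb{Q}^{N}_{\theta,n}(dx_{0:n})\, \varphi_n(x_n)$. It is apparent that $\mathbb{Q}^{N}_{\theta,n}$ constructed using this backward
method avoids the degeneracy in paths. It is even possible to compute $\zeta^{N}_{\theta,n}$ recursively as detailed
in Algorithm 1; since a recursion for $\eta_{\theta,n}$ is already available, it is apparent from (2.5) that what
remains is to specify a recursion for $\mathbb{E}_\theta[T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}, x_n]$. Let $\overline{T}_{\theta,n}(x_n)$ denote this term, then
for $n \ge 1$,

$$\overline{T}_{\theta,n}(x_n) = \mathbb{E}_\theta[T_{\theta,n}(X_{0:n}) \,|\, y_{0:n-1}, x_n]$$
$$= \mathbb{E}_\theta[T_{\theta,n-1}(X_{0:n-1}) \,|\, y_{0:n-1}, x_n] + \mathbb{E}_\theta[t_{\theta,n}(X_{n-1}, x_n) \,|\, y_{0:n-1}, x_n]$$
$$= \int M_{\theta,n}(x_n, dx_{n-1}) \left( \mathbb{E}_\theta[T_{\theta,n-1}(X_{0:n-1}) \,|\, y_{0:n-2}, x_{n-1}] + t_{\theta,n}(x_{n-1}, x_n) \right)$$
$$= \int M_{\theta,n}(x_n, dx_{n-1}) \left( \overline{T}_{\theta,n-1}(x_{n-1}) + t_{\theta,n}(x_{n-1}, x_n) \right)$$

where $\overline{T}_{\theta,0}(x_0) = t_{\theta,0}(x_0)$. Algorithm 1 computes $\zeta^{N}_{\theta,n}$ recursively in time by computing $(\overline{T}^{N}_{\theta,n}, \eta^{N}_{\theta,n})$
and is initialized with $\overline{T}^{(i)}_{\theta,0} = t_{\theta,0}(X^{(i)}_0)$ (see (2.2)) where $\{X^{(i)}_0\}_{1\le i\le N}$ are samples from $\pi_\theta(x_0)$.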


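Algorithm 1: A Particle Method to Compute the Filter Derivative

•  Assume at time $n-1$ that approximate samples $\{X^{(i)}_{n-1}\}_{1\le i\le N}$ from $\eta_{\theta,n-1}$ and approximations
$\{\overline{T}^{(i)}_{\theta,n-1}\}_{1\le i\le N}$ of $\{\overline{T}_{\theta,n-1}(X^{(i)}_{n-1})\}_{1\le i\le N}$ are available.

•  At time $n$, sample $\{X^{(i)}_{n}\}_{1\le i\le N}$ independently from the mixture

$$\frac{\sum_{j=1}^{N} f_\theta\big(x_n \,|\, X^{(j)}_{n-1}\big)\, g_\theta\big(y_{n-1} \,|\, X^{(j)}_{n-1}\big)}{\sum_{j=1}^{N} g_\theta\big(y_{n-1} \,|\, X^{(j)}_{n-1}\big)} \qquad (2.7)$$

and then compute $\{\overline{T}^{(i)}_{\theta,n}\}_{1\le i\le N}$ and $\zeta^{N}_{\theta,n}$ as follows:

$$\overline{T}^{(i)}_{\theta,n} = \frac{\sum_{j=1}^{N} \left( \overline{T}^{(j)}_{\theta,n-1} + t_{\theta,n}\big(X^{(j)}_{n-1}, X^{(i)}_{n}\big) \right) f_\theta\big(X^{(i)}_{n} \,|\, X^{(j)}_{n-1}\big)\, g_\theta\big(y_{n-1} \,|\, X^{(j)}_{n-1}\big)}{\sum_{j=1}^{N} f_\theta\big(X^{(i)}_{n} \,|\, X^{(j)}_{n-1}\big)\, g_\theta\big(y_{n-1} \,|\, X^{(j)}_{n-1}\big)}, \qquad (2.8)$$

$$\zeta^{N}_{\theta,n}(dx_n) = \frac{1}{N}\sum_{i=1}^{N} \left( \overline{T}^{(i)}_{\theta,n} - \frac{1}{N}\sum_{j=1}^{N} \overline{T}^{(j)}_{\theta,n} \right) \delta_{X^{(i)}_{n}}(dx_n). \qquad (2.9)$$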
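The steps of Algorithm 1 can be sketched as follows. This is an illustrative implementation under an assumed toy model that is not from the paper: $X_0 \sim \mathcal{N}(0,1)$, $X_n = \theta X_{n-1} + V_n$, $Y_n = X_n + W_n$ with standard Gaussian noise, for which $t_{\theta,0} = 0$ and $t_{\theta,n}(x_{n-1}, x_n) = (x_n - \theta x_{n-1})\,x_{n-1}$. The function name `filter_derivative_backward` is hypothetical.

```python
import numpy as np

def filter_derivative_backward(y, theta, num_particles, phi, seed=0):
    """O(N^2) estimates of zeta_{theta,n}(phi), n = 0..len(y), via (2.7)-(2.9).

    Toy model (an assumption): X_0 ~ N(0,1), X_n = theta*X_{n-1} + V_n,
    Y_n = X_n + W_n, with V_n, W_n standard Gaussian.
    """
    rng = np.random.default_rng(seed)
    N = num_particles
    x = rng.standard_normal(N)                 # X_0^{(i)} ~ pi
    Tbar = np.zeros(N)                         # Tbar_0 = t_{theta,0} = 0 here
    est = [np.mean(phi(x) * (Tbar - Tbar.mean()))]
    for y_prev in y:                           # y_prev plays the role of y_{n-1}
        logg = -0.5 * (y_prev - x) ** 2        # log g(y_{n-1} | X_{n-1}^{(j)}) + const
        g = np.exp(logg - logg.max())
        # (2.7): choose component j with probability prop. to g, then move with f.
        a = rng.choice(N, size=N, p=g / g.sum())
        x_new = theta * x[a] + rng.standard_normal(N)
        # (2.8): weights f(X_n^{(i)} | X_{n-1}^{(j)}) g(y_{n-1} | X_{n-1}^{(j)}),
        # stored with rows indexed by i and columns by j.
        resid = x_new[:, None] - theta * x[None, :]
        logw = -0.5 * resid ** 2 + logg[None, :]
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        t = resid * x[None, :]                 # t_{theta,n}(X_{n-1}^{(j)}, X_n^{(i)})
        Tbar = (w * (Tbar[None, :] + t)).sum(axis=1)
        x = x_new
        est.append(np.mean(phi(x) * (Tbar - Tbar.mean())))   # (2.9) applied to phi
    return np.array(est)

est_bwd = filter_derivative_backward(np.array([1.0]), theta=0.5,
                                     num_particles=1000, phi=lambda x: x)
```

In this toy model the filter mean at time 1 is $\theta y_0 / 2$, so the exact value of $\zeta_{\theta,1}(\varphi)$ for $\varphi(x) = x$ and $y_0 = 1$ is $0.5$, which gives a simple sanity check. The $N \times N$ weight matrix is what makes the cost per time step quadratic in $N$.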


Algorithm 1 uses the bootstrap particle filter of Gordon et al. [1993]. Note that any SMC
implementation of $\{\eta_{\theta,n}\}_{n\ge 0}$ may be used, e.g. the auxiliary SMC method of Pitt and Shephard [1999] or
sequential importance resampling with a tailored proposal distribution [Doucet et al., 2001]. It was
conjectured in Poyiadjis et al. [2011] that the asymptotic variance of $\zeta^{N}_{\theta,n}(\varphi)$ for bounded integrands
$\varphi$ is uniformly bounded w.r.t. $n$ under mixing assumptions. This is established in this article.


3  Stability  of  the  particle  estimates 


The convergence analysis of $\zeta^{N}_{\theta,n}$ (and $\zeta^{p,N}_{\theta,n}$ for performance comparison) will largely focus on the
convergence analysis of the $N$-particle measures $\mathbb{Q}^{N}_{\theta,n}$ (and correspondingly $\mathbb{Q}^{p,N}_{\theta,n}$) towards their
limiting values $\mathbb{Q}_{\theta,n}$ as $N \to \infty$, which is in turn intimately related to the convergence of the flow of
particle measures $\{\eta^{N}_{\theta,n}\}_{n\ge 0}$ towards their limiting measures $\{\eta_{\theta,n}\}_{n\ge 0}$. The error bounds and the
central limit theorem presented here have been derived using the techniques developed in Del Moral
[2004] for the convergence analysis of the particle occupation measures $\eta^{N}_{\theta,n}$. One of the central
objects in this analysis is the local sampling errors defined as

$$V^{N}_{\theta,n} = \sqrt{N} \left( \eta^{N}_{\theta,n} - \Phi_{\theta,n}\big(\eta^{N}_{\theta,n-1}\big) \right). \qquad (3.1)$$

The fluctuation and the deviations of these centered random measures can be estimated using
non-asymptotic Khintchine type $L_r$-inequalities, as well as Hoeffding or Bernstein type exponential
deviations [Del Moral, 2004, Del Moral and Rio, 2009]. In Del Moral and Miclo [2000] it is proved that
these random perturbations behave asymptotically as Gaussian random perturbations; see Lemma
7.10 in the Appendix for more details. In the proof of Theorem 7.11 (a supporting theorem) in
the Appendix we provide some key decompositions expressing the deviation of the particle measures
around their limiting values in terms of the local sampling errors $(V^{N}_{\theta,0}, \ldots, V^{N}_{\theta,n})$. These
decompositions are key to deriving the $L_r$-mean error bounds and central limit theorems for the filter
derivative.

The  following  regularity  conditions  are  assumed. 

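(A) The dominating measures $dx$ on $\mathcal{X}$ and $dy$ on $\mathcal{Y}$ are finite, and there exist constants $0 <
\rho, \delta, c < \infty$ such that for all $(x, x', y, \theta) \in \mathcal{X}^2 \times \mathcal{Y} \times \Theta$, the derivatives of $\pi_\theta(x)$, $f_\theta(x'|x)$ and $g_\theta(y|x)$
with respect to $\theta$ exist and

$$\rho^{-1} \le f_\theta(x'|x) \le \rho, \qquad \delta^{-1} \le g_\theta(y|x) \le \delta, \qquad (3.2)$$

$$|\nabla \log \pi_\theta(x)| \vee |\nabla \log f_\theta(x'|x)| \vee |\nabla \log g_\theta(y|x)| \le c. \qquad (3.3)$$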

Admittedly, these conditions are restrictive and fail to hold for many models in practice. (Exceptions
would include applications with a compact state-space.) However, they are typically made to
establish the time uniform stability of particle approximations of the filter [Del Moral, 2004, Cappé et al.,
2005] as they lead to simpler and more transparent proofs. Also, we observe that the behaviors
predicted by the Theorems below seem to hold in practice even in cases where the state-space models
do not satisfy these assumptions; see Section 4. Thus the results in this paper can be seen to provide
a qualitative guide to the behavior of the particle approximation even in the more general setting.

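For each parameter vector $\theta \in \Theta$, realization of observations $y = \{y_n\}_{n\ge 0}$ and particle number
$N$, let $(\Omega, \mathcal{F}, \mathbb{P}_\theta)$ be the underlying probability space of the random process $\{(X^{(1)}_n, \ldots, X^{(N)}_n)\}_{n\ge 0}$
comprised of the particle system only. Let $\mathbb{E}_\theta$ be the corresponding expectation operator computed
with respect to $\mathbb{P}_\theta$. The first of the two main results in this section is a time uniform non-asymptotic
error bound.

Theorem 3.1 Assume (A). For any $r \ge 1$, there exists a constant $C_r$ such that for all $\theta \in \Theta$,
$y = \{y_n\}_{n\ge 0}$, $n \ge 0$, $N \ge 1$, and $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$,

$$\sqrt{N}\; \mathbb{E}_\theta\left\{ \left| \zeta^{N}_{\theta,n}(\varphi_n) - \zeta_{\theta,n}(\varphi_n) \right|^{r} \right\}^{1/r} \le C_r.$$

Let $\{V_{\theta,n}\}_{n\ge 0}$ be a sequence of independent centered Gaussian random fields defined as follows.
For any sequence $\{\varphi_n\}_{n\ge 0}$ in $\mathcal{B}(\mathcal{X})$ and any $p \ge 0$, $\{V_{\theta,n}(\varphi_n)\}_{n=0}^{p}$ is a collection of independent
zero-mean Gaussian random variables with variances given by

$$\eta_{\theta,n}(\varphi_n^2) - \eta_{\theta,n}(\varphi_n)^2. \qquad (3.4)$$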

Theorem 3.2 Assume (A). There exists a constant $C < \infty$ such that for any $\theta \in \Theta$, $y = \{y_n\}_{n\ge 0}$,
$n \ge 0$ and $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$, $\sqrt{N}\,(\zeta^{N}_{\theta,n} - \zeta_{\theta,n})(\varphi_n)$ converges in law, as $N \to \infty$, to the centered
Gaussian random variable

$$\sum_{p=0}^{n} V_{\theta,p}\!\left( G_{\theta,p,n}\, \frac{D_{\theta,p,n}\big(F_{\theta,n} - \mathbb{Q}_{\theta,n}(F_{\theta,n})\big)}{D_{\theta,p,n}(1)} \right) \qquad (3.5)$$

whose variance is uniformly bounded above by $C$ where

$$F_{\theta,n} = \big(\varphi_n - \mathbb{Q}_{\theta,n}(\varphi_n)\big)\big(T_{\theta,n} - \mathbb{Q}_{\theta,n}(T_{\theta,n})\big).$$

The proofs of both these results are in the Appendix.

As a comparison, we quantify the variance of the particle estimate of the filter derivative computed
using the path-based method (see (2.6)). Consider the following simplified example that serves to
illustrate the point. Let $g_\theta(y|x) = g(y|x)$ (that is $\theta$-independent), $f_\theta(x_n|x_{n-1}) = \pi_\theta(x_n)$, where
$\pi_\theta$ is the initial distribution. (Note that $f_\theta$ in this case satisfies a rephrased version of (3.2) under
which the conclusion of Theorem 3.2 also holds.) Also, consider the sequence of repeated observations
$y_0 = y_1 = \cdots$ where $y_0$ is arbitrary. Applying Lemma 7.12 (in the Appendix) that characterizes the
limiting distribution of $\sqrt{N}\,(\mathbb{Q}^{p,N}_{\theta,n} - \mathbb{Q}_{\theta,n})$ to this special case results in $\sqrt{N}\,(\zeta^{p,N}_{\theta,n} - \zeta_{\theta,n})(\varphi)$ (see
(2.6)) having an asymptotic distribution which is Gaussian with mean zero and variance

$$n \times \pi_\theta(\overline{\varphi}^{\,2})\; \overline{\pi}_\theta\big[(\nabla \log \overline{\pi}_\theta)^2\big] + \pi_\theta\big[\overline{\varphi}^{\,2} (\nabla \log \pi_\theta)^2\big] - \big(\nabla \pi_\theta(\varphi)\big)^2$$

where $\overline{\varphi} = \varphi - \pi_\theta(\varphi)$, $\overline{\pi}_\theta(x) = \pi_\theta(x)\, g(y_0|x) \,/\, \pi_\theta(g(y_0|\cdot))$. This variance increases linearly with time
in contrast to the time bounded variance of Theorem 3.2.

4  Application  to  recursive  parameter  estimation 


Being able to compute $\{\zeta_{\theta,n}\}_{n\ge 0}$ is particularly useful when performing online static parameter
estimation for state-space models using Recursive Maximum Likelihood (RML) techniques [Le Gland and Mevel,
1997, Poyiadjis et al., 2005, 2011]; see also Kantas et al. [2009] for a general review of available
particle methods based solutions, including Bayesian ones, for this problem. The computed filter
derivative may also be useful in other areas; e.g. see Coquelin et al. [2008] for an application in
control.
4.1  Recursive  Maximum  Likelihood


Let $\theta^*$ be the true static parameter generating the observed data $\{y_n\}_{n\ge 0}$. Given a finite record of
observations $y_{0:T}$, the log-likelihood may be maximized with the following steepest ascent algorithm:

$$\theta_k = \theta_{k-1} + \gamma_k\, \nabla \log p_\theta(y_{0:T})\big|_{\theta=\theta_{k-1}}, \quad k \ge 1, \qquad (4.1)$$

where $\theta_0$ is some arbitrary initial guess of $\theta^*$, $\nabla \log p_\theta(y_{0:T})|_{\theta=\theta_{k-1}}$ denotes the gradient of the
log-likelihood evaluated at the current parameter estimate and $\{\gamma_k\}_{k\ge 1}$ is a decreasing positive
real-valued step-size sequence, which should satisfy the following constraints:

$$\sum_{k=1}^{\infty} \gamma_k = \infty, \qquad \sum_{k=1}^{\infty} \gamma_k^2 < \infty.$$
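As an illustration (this particular family is a standard example and is not prescribed by the text), polynomially decaying step sizes satisfy both constraints for a suitable exponent:

$$\gamma_k = k^{-\alpha}, \quad \tfrac{1}{2} < \alpha \le 1: \qquad \sum_{k=1}^{\infty} k^{-\alpha} = \infty \ \text{ since } \alpha \le 1, \qquad \sum_{k=1}^{\infty} k^{-2\alpha} < \infty \ \text{ since } 2\alpha > 1.$$

A constant step size satisfies the first constraint but not the second, so it trades convergence of the iterates for the ability to keep adapting.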

Although $\nabla \log p_\theta(y_{0:T})$ can be computed using (4.3), the computation cost can be prohibitive for
a long data record since each iteration of (4.1) would require a complete pass through the $T + 1$
data points. A more attractive alternative would be a recursive procedure in which the data is run
through once only sequentially. For example, consider the following update scheme:

$$\theta_n = \theta_{n-1} + \gamma_n\, \nabla \log p_\theta(y_n \,|\, y_{0:n-1})\big|_{\theta=\theta_{n-1}} \qquad (4.2)$$


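where $\nabla \log p_\theta(y_n|y_{0:n-1})|_{\theta=\theta_{n-1}}$ denotes the gradient of $\log p_\theta(y_n|y_{0:n-1})$ evaluated at the current
parameter estimate; that is, upon receiving $y_n$, $\theta_{n-1}$ is updated in the direction of ascent of the
conditional density of this new observation. Since we have

$$\nabla \log p_\theta(y_n \,|\, y_{0:n-1})\big|_{\theta=\theta_{n-1}} = \frac{\int \eta_{\theta_{n-1},n}(dx_n)\, \nabla g_\theta(y_n|x_n)\big|_{\theta=\theta_{n-1}} + \int \zeta_{\theta_{n-1},n}(dx_n)\, g_{\theta_{n-1}}(y_n|x_n)}{\int \eta_{\theta_{n-1},n}(dx_n)\, g_{\theta_{n-1}}(y_n|x_n)} \qquad (4.3)$$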

this clearly requires the filter derivative $\zeta_{\theta,n}$. The algorithm in the present form is not suitable
for online implementation as it requires re-computing the filter and its derivative at the value $\theta =
\theta_{n-1}$ from time zero. The RML procedure uses an approximation of (4.3) which is obtained by
updating the filter and its derivative using the parameter value $\theta_{n-1}$ at time $n$; we refer the reader
to Le Gland and Mevel [1997] for details. The asymptotic properties of the RML algorithm, i.e.
the behavior of $\theta_n$ in the limit as $n$ goes to infinity, have been studied in the case of an i.i.d. hidden
process by Titterington [1984] and by Le Gland and Mevel [1997] for a finite state-space hidden Markov
model. It is shown in Le Gland and Mevel [1997] that under regularity conditions this algorithm
converges towards a local maximum of the average log-likelihood and that this average log-likelihood
is maximized at $\theta^*$. A particle version of the RML algorithm of Le Gland and Mevel [1997] that uses
Algorithm 1's estimate of $\eta_{\theta,n}$ is presented as Algorithm 2.


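Algorithm 2: Particle Recursive Maximum Likelihood

•  At time $n-1$ we are given $y_{0:n-1}$, the previous estimate $\theta_{n-1}$ of $\theta^*$ and $\{(X^{(i)}_{n-1}, \overline{T}^{(i)}_{n-1})\}_{1\le i\le N}$.

•  At time $n$, upon receiving $y_n$, sample $\{X^{(i)}_{n}\}_{1\le i\le N}$ independently from (2.7) using parameter
$\theta = \theta_{n-1}$ to obtain

$$\eta_n(dx_n) = \frac{1}{N}\sum_{i=1}^{N} \delta_{X^{(i)}_{n}}(dx_n)$$

and then compute

$$\overline{T}^{(i)}_{n} = \frac{\sum_{j=1}^{N} \left( \overline{T}^{(j)}_{n-1} + t_{\theta_{n-1},n}\big(X^{(j)}_{n-1}, X^{(i)}_{n}\big) \right) f_{\theta_{n-1}}\big(X^{(i)}_{n} \,|\, X^{(j)}_{n-1}\big)\, g_{\theta_{n-1}}\big(y_{n-1} \,|\, X^{(j)}_{n-1}\big)}{\sum_{j=1}^{N} f_{\theta_{n-1}}\big(X^{(i)}_{n} \,|\, X^{(j)}_{n-1}\big)\, g_{\theta_{n-1}}\big(y_{n-1} \,|\, X^{(j)}_{n-1}\big)}, \qquad (4.4)$$

$$\zeta_n(dx_n) = \frac{1}{N}\sum_{i=1}^{N} \left( \overline{T}^{(i)}_{n} - \frac{1}{N}\sum_{j=1}^{N} \overline{T}^{(j)}_{n} \right) \delta_{X^{(i)}_{n}}(dx_n), \qquad (4.5)$$

and

$$\widehat{\nabla} \log p(y_n \,|\, y_{0:n-1}) = \frac{\int \eta_n(dx_n)\, \nabla g_\theta(y_n|x_n)\big|_{\theta=\theta_{n-1}} + \int \zeta_n(dx_n)\, g_{\theta_{n-1}}(y_n|x_n)}{\int \eta_n(dx_n)\, g_{\theta_{n-1}}(y_n|x_n)}.$$

•  Finally update the parameter:

$$\theta_n = \theta_{n-1} + \gamma_n\, \widehat{\nabla} \log p(y_n \,|\, y_{0:n-1}). \qquad (4.6)$$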

Under Assumption A, the particle approximation of the filter is stable [Del Moral, 2004]; see also
Lemma 7.4 in the Appendix. This combined with the proven stability of the particle approximation
of the filter derivative implies that the particle estimate of the derivative of $\log p(y_n|y_{0:n-1})$ is also
stable.


4.2  Simulations 


The RML algorithm is applied to the following stochastic volatility model [Pitt and Shephard, 1999]:

$$X_0 \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{1-\phi^2}\right), \qquad X_{n+1} = \phi X_n + \sigma V_{n+1}, \qquad Y_n = \beta \exp(X_n/2)\, W_n,$$

where $\mathcal{N}(m, s)$ denotes a Gaussian random variable with mean $m$ and variance $s$, $V_n \sim \mathcal{N}(0,1)$
and $W_n \sim \mathcal{N}(0,1)$ are two mutually independent i.i.d. sequences, both independent of the initial state
$X_0$. The model parameters, $\theta = (\phi, \sigma, \beta)$, are to be estimated.
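For readers who want to experiment, data from this model can be simulated as follows. This is a minimal sketch: the function name is hypothetical, and the initial state is assumed to be drawn from the stationary law of the autoregressive state process, which requires $|\phi| < 1$.

```python
import numpy as np

def simulate_sv(num_obs, phi, sigma, beta, seed=0):
    """Simulate states x_{0:num_obs-1} and observations y_{0:num_obs-1}
    from the stochastic volatility model.

    Assumption: X_0 ~ N(0, sigma^2 / (1 - phi^2)), the stationary law.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(num_obs)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi ** 2))
    for n in range(1, num_obs):
        x[n] = phi * x[n - 1] + sigma * rng.standard_normal()
    # The observation noise scale is modulated by the hidden log-volatility.
    y = beta * np.exp(x / 2.0) * rng.standard_normal(num_obs)
    return x, y

x_sim, y_sim = simulate_sv(1000, phi=0.8, sigma=np.sqrt(0.1), beta=1.0, seed=1)
```

Fixing the seed makes runs reproducible, which is convenient when comparing the variance of different particle estimates on the same data record.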

Our first example demonstrates the theoretical results in Section 3. The estimate of $\partial/\partial\sigma\,
\log p(y_{n:n+L-1} \,|\, y_{0:n-1})$ at $\theta^* = (0.8, \sqrt{0.1}, 1)$ was computed using Algorithm 1 with 500
particles and using the path-space method (see (2.6)) with $2.5 \times 10^5$ particles for the stochastic volatility
model. The block size $L$ was 500. Shown in Figure 1 is the variance of these particle estimates
for various values of $n$ derived from many independent random replications of the simulation. The
linear increase of the variance of the path-space method as predicted by theory is evident although
Assumption A is not satisfied.

For the path-space method, because the variance of the estimate of the filter derivative grows
linearly in time, the eventual high variance in the gradient estimate can result in the divergence of the
parameter estimates. To illustrate this point, (4.6) was implemented with the path-space estimate of
the  filter  derivative  (12.61)  computed  with  10000  particles  and  constant  step-size  sequence,  7„  =  10“"^ 
for all $n$. $\theta_0$ was initialized at the true parameter value. A sequence of two million observations was
simulated with $\theta^* = (0.8, \sqrt{0.1}, 1)$. The results are shown in Figure 3.

For the same value of $\theta^*$ and sequence of observations used in the previous example, Algorithm
2  was  executed  with  500  particles  and  7„  =  0.01,  n  <  10®,  7^  =  (n  —  5  x  10^)“°  ®,  n  >  10®.  As  it 
can be seen from the results in Figure 2, the estimate converges to a value in the neighborhood of
the true parameter.

Figure 1: Variance of the particle estimates of $\partial/\partial\sigma \log p(y_{n:n+500-1} | y_{0:n-1})$ for various values of $n$
for the stochastic volatility model. Circles are the variance of Algorithm 1's estimate with 500 particles.
Stars indicate the variance of the estimate of the path-space method with $2.5 \times 10^5$ particles. The dotted
line is the best fitting straight line to the path-space method's variance to indicate the trend.

Figure 2: Sequence of recursive parameter estimates, $\theta_n = (\sigma_n, \phi_n, \beta_n)$, computed using (4.6) with
$N = 500$. From top to bottom: $\beta_n$, $\phi_n$ and $\sigma_n$. Marked on the right are the “converged values”,
which were taken to be the empirical average of the last 1000 values.

Figure 3: RML for stochastic volatility with path-space gradient estimate with 10,000 particles,
constant step-size and initialized at the true parameter values, which are indicated by the dashed
lines. From top to bottom: $\phi$, $\beta$ and $\sigma$.


5  Conclusion 


We have presented theoretical results establishing the uniform stability of the particle approximation
of the optimal filter derivative proposed in Poyiadjis et al. [2005, 2009]. While these results have
been presented in the context of state-space models, they can also be applied to Feynman-Kac
models [Del Moral, 2004], which could potentially enlarge the range of applications. For example, if
$dx' f_\theta(x'|x)$ is reversible w.r.t. some probability measure $\mu_\theta$ and if we replace $g_\theta(y_n|x_n)$ with
a time-homogeneous potential function $g_\theta(x_n)$ then $\eta_{\theta,n}$ converges, as $n \to \infty$, to the probability
measure $\mu_{\theta,h}$ defined as

$$\mu_{\theta,h}(dx) := \frac{1}{\mu_\theta\left( h_\theta \int dx' f_\theta(x'|\cdot)\, h_\theta(x') \right)}\; \mu_\theta(dx)\, h_\theta(x) \int dx' f_\theta(x'|x)\, h_\theta(x')$$

where $h_\theta$ is a positive eigenmeasure associated with the top eigenvalue of the integral operator
$Q_\theta(x, dx') = g_\theta(x)\, dx' f_\theta(x'|x)$ (see Section 12.4 of Del Moral [2004]). The measure $\mu_{\theta,h}$ is the
invariant measure of the h-process defined as the Markov chain with transition kernel $M_\theta(x, dx') \propto
dx' f_\theta(x'|x)\, h_\theta(x')$. The particle algorithm described here can be directly used to approximate the
derivative of this invariant measure w.r.t. $\theta$. It would also be of interest to weaken Assumption A
and there are several ways this might be approached, for example for non-ergodic signals using ideas
in Oudjane and Rubenthaler [2005], Heine and Crisan [2008], or via Foster-Lyapunov conditions as in
Beskos et al. [2011], Whiteley [2011].
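The h-process construction above can be checked numerically on a finite state space, where the top eigenelement reduces to a Perron right eigenvector. The sketch below is illustrative only: the function name `h_process`, the use of a symmetric weight matrix `W` to build a reversible kernel, and the finite state space are all assumptions introduced here.

```python
import numpy as np

def h_process(W, g):
    """Finite-state h-process for a reversible kernel and a positive potential.

    W : (S, S) symmetric array with positive entries (defines a reversible kernel)
    g : (S,)  array of positive potential values
    """
    row = W.sum(axis=1)
    F = W / row[:, None]              # f(x'|x); reversible w.r.t. mu because W is symmetric
    mu = row / row.sum()
    Q = g[:, None] * F                # Q(x, x') = g(x) f(x'|x)
    vals, vecs = np.linalg.eig(Q)
    top = np.argmax(vals.real)        # Perron (top) eigenvalue of the positive matrix Q
    h = np.abs(vecs[:, top].real)     # Perron eigenvector has one sign; make it positive
    Fh = F @ h
    M = F * h[None, :] / Fh[:, None]  # M(x, x') proportional to f(x'|x) h(x')
    mu_h = mu * h * Fh                # mu(x) h(x) sum_{x'} f(x'|x) h(x'), then normalise
    mu_h = mu_h / mu_h.sum()
    return M, mu_h, h, vals[top].real
```

Invariance of `mu_h` under `M` follows from reversibility alone, so the check `mu_h @ M` against `mu_h` is a direct finite-dimensional instance of the displayed formula.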
7 Appendix


The statements of the results in this section hold for any $\theta$ and any sequence of observations
$y = \{y_n\}_{n \ge 0}$. All mathematical expectations are taken with respect to the law of the particle system only
for the specific $\theta$ and $y$ under consideration. While $\theta$ is retained in the statement of the results, it is
omitted in the proofs. The superscript $y$ of the expectation operator is also omitted in the proofs.

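This section commences with some essential definitions in addition to those in Section 2. Let

$$P_{\theta,k,n}(x_k, dx_n) = \frac{Q_{\theta,k,n}(x_k, dx_n)}{Q_{\theta,k,n}(1)(x_k)}$$

and

$$\mathcal{M}_{\theta,p}(x_p, dx_{0:p-1}) = \prod_{k=p}^{1} M_{\theta,k}(x_k, dx_{k-1}), \qquad p > 0,$$

and its corresponding particle approximation is

$$\mathcal{M}^N_{\theta,p}(x_p, dx_{0:p-1}) = \prod_{k=p}^{1} M^N_{\theta,k}(x_k, dx_{k-1}).$$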


To  make  the  subsequent  expressions  more  terse,  let 

Ve,n  =  n  >  0,  (7.1) 

where  =  rjofi  =  ire  by  convention.  (Recall  «-!,«■)  Let 


$$\mathcal{F}^N_n = \sigma\left( X_k^{(i)};\ 0 \le k \le n,\ 1 \le i \le N \right), \qquad n \ge 0,$$

be the natural filtration associated with the $N$-particle approximation model and let $\mathcal{F}^N_{-1}$ be the
trivial sigma field.

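The following estimates are a straightforward consequence of Assumption (A). For all $\theta$ and time
indices $0 \le k \le q \le n$,

$$b_{\theta,k,n} = \sup_{x_k, x'_k} \frac{Q_{\theta,k,n}(1)(x_k)}{Q_{\theta,k,n}(1)(x'_k)} \le \rho^2 \delta^2,$$

and for $0 \le k \le q$,

$$\beta\left( \frac{Q_{\theta,k,q}(x_k, dx_q)\, Q_{\theta,q,n}(1)(x_q)}{Q_{\theta,k,q}(Q_{\theta,q,n}(1))(x_k)} \right) \le \left(1 - \rho^{-4}\right)^{(q-k)} = \bar{\rho}^{\,q-k}. \tag{7.2}$$

Note that setting $q = n$ in (7.2) yields an estimate for $\beta(P_{\theta,k,n})$.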
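To make the role of the contraction estimate concrete, the sketch below computes a contraction coefficient for a finite-state Markov kernel. It assumes that $\beta(\cdot)$ denotes the Dobrushin contraction coefficient, i.e. the largest total variation distance between two rows of the kernel; the function name `dobrushin` and the finite-state setting are illustrative assumptions.

```python
import numpy as np

def dobrushin(P):
    """Dobrushin coefficient of a row-stochastic matrix P.

    beta(P) = max over pairs of states (x, x') of the total variation
    distance between the rows P(x, .) and P(x', .).
    """
    # diffs[a, b] = sum_z |P[a, z] - P[b, z]| = 2 * TV(P[a, .], P[b, .])
    diffs = np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2)
    return 0.5 * diffs.max()
```

The coefficient is sub-multiplicative, $\beta(P_1 P_2) \le \beta(P_1)\beta(P_2)$, which is the mechanism behind geometric bounds of the form $\bar{\rho}^{\,q-k}$ over $q - k$ time steps.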

Several auxiliary results are now presented, all of which hinge on the following Khintchine type
moment bound proved in Del Moral [2004, Lem. 7.3.3].


Lemma 7.1 [Del Moral, 2004, Lemma 7.3.3] Let $\mu$ be a probability measure on the measurable space
$(E, \mathcal{E})$. Let $G$ and $h$ be $\mathcal{E}$-measurable functions satisfying $G(x) \ge c\,G(x') > 0$ for all $x, x' \in E$ where $c$
is some finite positive constant. Let $\{X^{(i)}\}_{1 \le i \le N}$ be a collection of independent random samples from
$\mu$. If $h$ has finite oscillation then for any integer $r \ge 1$ there exists a finite constant $a_r$, independent
of $N$, $G$ and $h$, such that

$$\sqrt{N}\, E\left( \left| \frac{\sum_{i=1}^{N} G(X^{(i)})\, h(X^{(i)})}{\sum_{i=1}^{N} G(X^{(i)})} - \frac{\mu(Gh)}{\mu(G)} \right|^r \right)^{1/r} \le c^{-1} \operatorname{osc}(h)\, a_r.$$
Proof:

The result for $G = 1$ and $c = 1$ is proved in Del Moral [2004]. The case stated here can be established
using the representation

$$\frac{\mu^N(Gh)}{\mu^N(G)} - \frac{\mu(Gh)}{\mu(G)} = \frac{\mu(G)}{\mu^N(G)}\, \left(\mu^N - \mu\right)\left( \frac{G}{\mu(G)} \left( h - \frac{\mu(Gh)}{\mu(G)} \right) \right)$$

where $\mu^N(dx) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}}(dx)$.
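The representation used in this proof, written as $\mu^N(Gh)/\mu^N(G) - \mu(Gh)/\mu(G) = \frac{\mu(G)}{\mu^N(G)} (\mu^N - \mu)\big( \frac{G}{\mu(G)} (h - \frac{\mu(Gh)}{\mu(G)}) \big)$, is a purely algebraic identity and can be verified numerically. The following sketch does so for a discrete measure; the state-space size, the choice of `G` and `h`, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
S = 6                                   # size of a finite state space (assumption)
mu = rng.dirichlet(np.ones(S))          # probability measure mu
G = rng.uniform(0.5, 2.0, S)            # positive function G
h = rng.normal(size=S)                  # bounded function h
N = 200
samples = rng.choice(S, size=N, p=mu)   # i.i.d. samples from mu
mu_N = np.bincount(samples, minlength=S) / N   # empirical measure mu^N

# Left-hand side: difference of the self-normalised ratios.
lhs = mu_N @ (G * h) / (mu_N @ G) - mu @ (G * h) / (mu @ G)

# Right-hand side: (mu(G)/mu^N(G)) * (mu^N - mu) applied to the centred function.
centred = (G / (mu @ G)) * (h - mu @ (G * h) / (mu @ G))
rhs = (mu @ G) / (mu_N @ G) * ((mu_N - mu) @ centred)
```

The identity reduces the self-normalised error to an ordinary empirical-process error for a single centred function, which is what allows the $G = 1$ case to be reused.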

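Remark 7.2 For $k \ge 0$, let $h^N_{k-1}$ be a $\mathcal{F}^N_{k-1}$-measurable function satisfying $h^N_{k-1} \in \mathrm{Osc}_1(\mathcal{X})$ almost
surely. Then Lemma 7.1 can be invoked to establish

$$\sqrt{N}\, E^y_\theta\left( \left| \frac{\eta^N_{\theta,k}(G h^N_{k-1})}{\eta^N_{\theta,k}(G)} - \frac{\Phi_{\theta,k}(\eta^N_{\theta,k-1})(G h^N_{k-1})}{\Phi_{\theta,k}(\eta^N_{\theta,k-1})(G)} \right|^r \,\Big|\, \mathcal{F}^N_{k-1} \right)^{1/r} \le c^{-1} a_r$$

where $G$ is defined as in Lemma 7.1.

Lemma 7.3 to Lemma 7.6 are a consequence of Lemma 7.1 and the estimates in (7.2).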


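Lemma 7.3 For any $r \ge 1$ there exists a finite constant $a_r$ such that the following inequality holds
for all $\theta$, $y$, $0 \le k \le n$ and $\mathcal{F}^N_{k-1}$-measurable function $\varphi^N_n$ satisfying $\varphi^N_n \in \mathrm{Osc}_1(\mathcal{X})$
almost surely,

$$\sqrt{N}\, E^y_\theta\left( \left| \left[ \Phi_{\theta,k,n}(\eta^N_{\theta,k}) - \Phi_{\theta,k-1,n}(\eta^N_{\theta,k-1}) \right](\varphi^N_n) \right|^r \,\Big|\, \mathcal{F}^N_{k-1} \right)^{1/r} \le a_r\, b_{\theta,k,n}\, \beta(P_{\theta,k,n}),$$

where, by convention, $\Phi_{\theta,-1,n}(\eta^N_{\theta,-1}) = \eta_{\theta,n}$, and the constants $b_{\theta,k,n}$ and $\beta(P_{\theta,k,n})$ were defined in
(7.2).

Proof:

$$\Phi_{k,n}(\eta^N_k)(\varphi^N_n) - \Phi_{k-1,n}(\eta^N_{k-1})(\varphi^N_n) = \int \left( \frac{\eta^N_k(dx_k)\, Q_{k,n}(1)(x_k)}{\eta^N_k Q_{k,n}(1)} - \frac{\Phi_k(\eta^N_{k-1})(dx_k)\, Q_{k,n}(1)(x_k)}{\Phi_k(\eta^N_{k-1}) Q_{k,n}(1)} \right) P_{k,n}(\varphi^N_n)(x_k)$$

where $\Phi_0(\eta^N_{-1}) = \eta_0$ by convention. Applying Lemma 7.1 with the estimates in (7.2) we have

$$\sqrt{N}\, E\left( \left| \Phi_{k,n}(\eta^N_k)(\varphi^N_n) - \Phi_{k-1,n}(\eta^N_{k-1})(\varphi^N_n) \right|^r \,\Big|\, \mathcal{F}^N_{k-1} \right)^{1/r} \le a_r\, b_{k,n}\, \beta(P_{k,n})$$

almost surely.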

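Lemma 7.3 may be used to derive the following error estimate [Del Moral, 2004, Theorem 7.4.4].


Lemma 7.4 For any $r \ge 1$, there exists a constant $C_r$ such that the following inequality holds for
all $\theta$, $y$, $n \ge 0$ and $\varphi \in \mathrm{Osc}_1(\mathcal{X})$,

$$\sqrt{N}\, E^y_\theta\left( \left| [\eta^N_{\theta,n} - \eta_{\theta,n}](\varphi) \right|^r \right)^{1/r} \le C_r \sum_{k=0}^{n} b_{\theta,k,n}\, \beta(P_{\theta,k,n}). \tag{7.4}$$

Assume (A). For any $r \ge 1$, there exists a constant $C_r$ such that for all $\theta$, $y$, $n \ge 0$, $\varphi \in \mathrm{Osc}_1(\mathcal{X})$,
$G \in \mathcal{B}(\mathcal{X})$ such that $G$ is positive and satisfies $G(x) \ge c_G\, G(x')$ for all $x, x' \in \mathcal{X}$ for some positive
constant $c_G$,

$$\sqrt{N}\, E^y_\theta\left( \left| \int \left( \frac{\eta^N_{\theta,n}(dx_n)\, G(x_n)}{\eta^N_{\theta,n}(G)} - \frac{\eta_{\theta,n}(dx_n)\, G(x_n)}{\eta_{\theta,n}(G)} \right) \varphi(x_n) \right|^r \right)^{1/r} \le C_r \left(1 + c_G^{-1}\right). \tag{7.5}$$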

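Proof:

The first part follows from applying Lemma 7.3 to the telescopic sum [Del Moral, 2004, Theorem
7.4.4]:

$$(\eta^N_n - \eta_n)(\varphi) = \sum_{k=0}^{n} \left[ \Phi_{k,n}(\eta^N_k)(\varphi) - \Phi_{k-1,n}(\eta^N_{k-1})(\varphi) \right]$$

with the convention that $\Phi_{-1,n}(\eta^N_{-1}) = \eta_n$. For the second part, use the same telescopic sum but
with the $k$-th term being

$$\frac{\Phi_{k,n}(\eta^N_k)(\varphi G)}{\Phi_{k,n}(\eta^N_k)(G)} - \frac{\Phi_{k-1,n}(\eta^N_{k-1})(\varphi G)}{\Phi_{k-1,n}(\eta^N_{k-1})(G)} = \int \left( \frac{\eta^N_k(dx_k)\, Q_{k,n}(G)(x_k)}{\eta^N_k Q_{k,n}(G)} - \frac{\Phi_k(\eta^N_{k-1})(dx_k)\, Q_{k,n}(G)(x_k)}{\Phi_k(\eta^N_{k-1}) Q_{k,n}(G)} \right) \frac{Q_{k,n}(G\varphi)(x_k)}{Q_{k,n}(G)(x_k)}.$$

Apply Lemma 7.1 using the same estimates in (7.2), i.e. the same estimates hold with $G$ replacing
1 in the definition of $b_{k,n}$ and with $G$ replacing $Q_{q,n}(1)$ in the argument of $\beta$.

The following result is a consequence of Lemma 7.4.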


Lemma  7.5  Assume  (A).  For  any  r  >  1,  there  exists  a  constant  Cr  such  that  the  following  inequality 
holds  for  all  9 ,  y ,  0  <  k  <  n,  N  >  0  and  ipn  G  Osci{X), 

VNE^g  (\[^e,k,niVd,k)  -  ^e,k,ni'n9,k)]  (T’n)!’')  "  < 


Proof: 

The  result  is  established  by  expressing  ^k,n{ilk) 


^k^nijik  ^{.dXn)  — 


9k  ^dXk)Q k,n(^^^(^Xk^ 

Vk  Qk,ni^) 


expressing  ^k,n(jik)  similarly,  setting  G  in  (17.51)  to  Qk,ni^),  F  = 

dl^. 


Pk,n{xk,dXn), 

Pk,n{Fn)  and  using  the  estimates  in 


Lemma  7.6  For  each  r  >  1,  there  exists  a  finite  constant  Cr  such  that  for  all  6 ,  y ,  Q  <  k  <  q  <  n, 
and  measurable  functions  Fq  satisfying  (pq  G  Osci{X)  almost  surely, 


VNE'^g 


<  Cr  b, 


^9,k,q{9e,k){dXq)Qe^q^ri(X){Xq)  ^9,k—l,qi'9e,k—l)idXq)Q9,q,ni^)iXq)  \  ^ 

^9,kAV^,k)Q9,g,na)  ^e,k-lM,k-l)Q9,gA^)  ) 

Q9,k,q(^^k-)  q 


e,kA 


/? 


Q 9 ^k ^qi^Q 9 k) 


1 
Proof:

This result is established by noting that


^k,q{Vk)Qq,n{^)  ^k-l,q{'ll^-l)Qq,ni^) 

_  f  ( Vk  {dXk)Qk,n{i){Xk)  _  ^k{r]k-l){dXk)Qk,nii)iXk)\  Qk,q{Xk,  dXq)Qq^nii)iXq) 

VkQkA^)  <^k{vk-i)QkA^)  )  Qk,ui^)ixk) 

Now Lemma 7.1 is applied using the estimates in (7.2).

Lemma 7.7 Assume (A). There exists a collection of pairs of finite positive constants, $a_i, c_i$, $i \ge 1$,
such that the following bounds hold for all $r \ge 1$, $\theta$, $y$, $0 \le p \le n$, $N \ge 1$, $x_p \in \mathcal{X}$, $F_p \in \mathcal{B}(\mathcal{X}^{p+1})$,
$F_n \in \mathcal{B}(\mathcal{X}^{n+1})$,


-  d^e,piFpi.,Xp)){Xp)\''  <  ||Fp||a^p, 
{\D^^pjF^){xp)  -  Ds,p,niFn)ixp)\"  )  "  <  a,c„  ||F„|| . 

Proof: 

For  each  Xp,  let  xoip-i  — >■  Gp-i^xp{xo-p-i)  =  Fp{xo-p)q{Xp\xp-i).  Adopting  the  convention  Pq  =  rjo, 


Vp-kDp-k,p-iidxo-.p-i)q{xp\xp-i)  Vp_kD"_kp_j^idxo-.p-i)qixp\Xp-i) 


Mp  {Fp[.,  Xp))  (xp)  -  Mp  {Fp{.,  Xp))  (xp) 

P  r  /  riN 

=  E 

P 

=  E 

/c=l  ' 


Vp-kDp-k,p-i(<l{xp\.)) 


Vp-kDp_k,p-A<l{xp\.)) 


Fp{xQ-p) 


fc)Qp— A:)  A:)Qp— /c,p— l(^(^p|-))(^p— A:) 


^p_fcQp-/c,p-i  (q(2;pI-)) 

^p—k,p—l,Xp  (^p— fc) 

Qp—k,p—l  (^(^p  I  •)) {^p— ) 


^p_fcOp-fc,p-l  (^(^pl  ■)) 


where  G^_fc_p_i_,,^(xp_fc) 

— 

p—k^p 

—AGp—i,xp){xp 

norm 

sup 

^p—k 

fiN 

^p—k,p—l,Xp 

Qp— fc,p— 1  (^(^p 

,  {Xp-k) 


^p\')  JK-^p- 


<ll^pll- 


The  result  is  established  upon  applying  Lemma  17.11  (see  Remark  I7.2p  to  each  term  in  the  sum 
separately  and  using  the  estimates  in  dia).  To  establish  the  second  result,  let 


d^p,n  (^0:p)  —  J  Qp+1  {Xp ,  dXp-^1 )  ■  ■  ■  Qn{Xii—l ,  dXn)Fji  • 

Then, 

Dp,n{Fn){Xp)  -  Dp^n{Fn){Xp)  =  Mp  (Pp,„  (. ,  Xp))  (Xp)  -  Mp  (Fp,„(.,Xp))  (Xp). 

The  result  follows  by  setting  c„  =  psupg  IIQe.p.n)!)!!  and  it  follows  from  Assumption  (A)  that  c„  is 
finite. 

Lemma 7.8 and Lemma 7.9 both build on the previous results and are needed for the proof of
Theorem 3.1.
Lemma 7.8 Assume (A). For any $r \ge 1$ there exists a constant $C_r$ such that for all $\theta$, $y$, $0 \le k \le n$,
$N \ge 1$, $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$,


VNE'^g  I  J  Qe^n{dxo:n)t9,k  {xk-l,Xk)  {ipn{Xn)  “  V^,niFn)) 

f  9B,kDe,k,ni.dX0-.n)  ^  ^  J  ,  ^  V^,kD0,k,ni‘fn)' 

'  - tB^k[Xk-l,Xk)  Fn{Xn) - 


J  '9^,k^^,k,n(^) 

<  2(n  -  k)CrTA-^ 

Proof: 

The  term  (iLhl)  can  be  further  expanded  as 

fV^D^Jdxo-.n).  J  .  ,  d^KniFu)' 


d^,k^^,k,ni^) 


(7.6) 


d^D^Jl) 


{dX0:n)tk  {xk-l,Xk)  {y:>n{Xn)  “  Vn  i.Fn)) 


n— 1  « 

p—k 

n—1 


- tk  [Xk-l,Xk)  tfn[Xn) - - 


d^D^Jl) 


p—k 
n—1 

=  E 

p—k 
n—1 

-E 

p—k 

n—1 

-E 

p—k 
n—1 

=  E 

p—k 


-E 

p—k 


fd^+,D^+Ud^^--rD.  ,  J  ,  ,  d^+lD^+,,ni^n) 


'«JdX0:„)  d^+,D^+,JdX0..n)\^,  J  ,  , 

-  -  '  tk{Xk-l,Xk)  \(Pn[Xn)  - 


«n(7^n) 


d^D^Fn)  d"+lD"+lAFn)\  d^+lD^+l,M  d^D^Jtk) 
_  d^+iD^+iAFn)\  d^D^,nitk) 
dp  D^,nidX0.n)  dp+lDp+i^n{dX0.n)\ 


X  I  tk  {xk-l,Xk)  - 


<^p^n(l)  J 

d^+lD^+UFn)' 


V’n{Xn)  - 


J 

'<+i7?p^+i.„(4)  V^D^^tk)' 


(7.7) 


(7.8) 


For  the  first  equality,  note  that  lyff  Dff  ^{dx^-n)  =  Q„  {dxo-n)-  It  is  straightforward  to  establish  that 


dp  Dp^n(.dxo-.n)/Vp  igiVpl-))  =  dp+lDp+i^nidX0:n), 


(7.9) 
which is due to


'Hp  jdxp) 
Vp{9{yp\  ■)) 


Qj+i  )  dxj^i ) 


J^P 


_  Vp  idxp)g{yp\xp)f{xp+i\xp)  dxp+ir]^  ig{yp\  ■)f{xp+i  \  ■)) 
Vp  i9{yp\-)fixp+i\-))  Vp  (givpl-)) 

n—1 

=  Mp_^_i{xp+i,dxp)rip_^_i{dxp^i)  Qj^i{xj,dxj+i). 

j=P+i 


Qj+l{Xj:dXj+l) 

j=p+l 


Thus 


Vp  D^^nidxo:p+i,dxn)  ri"^T^D^_^_-^  „{dxo-p+i,dxn) 

_  7j^+iD^^j^  „_{dxo,p+i,dxn)  r]^+iD^+i^n{dxo,p+i,dxn) 

f  '9p+i{dxp+i)Qp+i^n{^){xp+i) 

\  Vp+lQp+l,ni^) 


(7.10) 


^^l(‘^^P+l)Qp+l,’^(l)(^P+l)  \  i\4N  /  1  s^Qp+l,n{Xp+i,dXn) 

In  the  first  line,  variables  Xp+2-.n-i  of  the  measures  rjpDp  .^{dxQ-n)  and  r]p+iDp_^ip^{dxo-n)  are  inte¬ 
grated  out  while  the  second  line  follows  from  (I7.9I1.  Using  (I7.10L  the  term  (|7.7I1  can  be  expressed 
as 


n—1 

E 

p—k  ' 


(  ^p+1  (^^p+l)Qp+l,n(l)(^p+l)  ^p+1  ('^^p+l)Qp+l,n(l)(^p+l  )  \ 

7:N  n  .  Tn  n  .  m 


lp+lQp+l,n{^) 


'np-\-iQp+i,n{^) 


VpDp^u{V>n)\  N  (.  Vp+lDp+i^ni^k) 


xPp+i.„U„  1  (^P+i)-^P+i  ~N 


Ip  p,ri 

Note  that  by  (13.31).  (17.21)  and  (17. 3L 


41) 


^p-i-i^p-i-i, 


(sp+i) 


-  p+l,n  1  V^n 


^^+i<+i,44) 

?y^+i<+i.n(l) 


(a^P-Hi) 


< 


(s^p+i) 


^  I  l5p-t-l,n  (^p-t-l ;  dXyi  ) 
\  Qp+l,n{^){Xp-^-i) 


<cp 


Thus  by  (17.21)  and  Lemma  EH  we  conclude  that  there  exists  a  hnite  constant  Cr  (depending  only 
on  r) 


n—1 

'^Vne- 

p—k 


Iy5n(a;ri)  - 


Vp  Dp, nidXo,n)  _  yp+iDp+i^r,idxo-.n) 

<l^p^n(l)  <+iP^+i.4l) 


<  (n  —  k)CrP' 


—n  —  k 


(7.11) 
For the term (7.8), it follows from (7.10)

f  Vp+l{dXp+i)Qp+i^n{^)iXp+l)  f  14.  \  ^p+1  (Qp+l."(^)-^p+l  (^fc)) 

“  J  <+i0p+i,4i)  1,  ■'+■  <+.Q,+.,4i)  j  ■ 


Thus,  using  dSSI)  and  (EH),  there  exists  some  non-random  constant  C  such  that  the  following  bound 
holds  almost  surely  for  all  integers  k  <  p  <  n,  N: 


< 


'7^P-fc  +  l 


Combine  this  bound  with  Lemma  O  to  conclude  that  there  exists  a  finite  (non-random)  constant 
Cr  (depending  only  on  r)  such  that  for  all  integers  k  <  p  <  n,  N: 


Vne 


V^+lD^+l,ui^n)\ 


<  CrP"-^ 

(7.12) 


The result now follows from (7.11) and (7.12).


Lemma 7.9 Assume (A). For any $r \ge 1$ there exists a constant $C_r$ such that for all $\theta$, $y$, $0 \le k \le n$,
$N \ge 1$, $\varphi_n \in \mathrm{Osc}_1(\mathcal{X})$,


\/NEI 


<  Cr-p 


i  —  k 


(7.13) 


Proof: 


V^Dj:jdxo..n)^  ,  J  ,  , 

=  J  Qn{dxo:n)tk  {Xk-l,Xk)  {pn{Xn)  -  PniFn)) 
f  f  Vk  Dk,nidX0:n) 


J  \ 

.  (  ,  ,  vilD^Jtk) 

To  study  the  errors,  term  (17.141)  may  be  decomposed  as 
f  (  9k  DUnidxo-.n) 


-  Qn{dxo,n)  tk  {xk-l,Xk)  {pn{Xn)  “  VniFn)) 


k  /*  /  ,^N  tiN 

=  E 


-  Qn{dxo..n)  tk  {Xk-l,Xk)  {pn{Xn)  “  VniFn)) 


Vp  Dp, nidXQ-n)  rj"  D^„^{dX0:n) 


P—0  ' 


V^D^Jl) 


tk  {xk-l,Xk)  {pn{Xn)  -  VniFn)) 


(7.14) 

(7.15) 
with  the  convention  that  r]^  =  $o  (^-i)  =  Vo-  The  term  corresponding  to  p  =  k  can  be  expressed  as 


Vk  idXk)Qk,ri{X)^^k)  Vk  {dXk)Qk,n{^){xk)\  ^  \  r>  \\f^  ^ 

~N m  {^kt  dxk—l)tk  {Xk  —  1-,  Xk)  Pk,n[^n  Vn{}PTi))\Xk) 

VkQkA^)  VkQkA^)  } 

Using  Lemma  ITU]  and  Remark  17.21 


Vk^dXk)QkAAxk)  vAdXk)Qk,nA)(Xk)\  ^  o.  „  c.  ^ 

~Nn  \tk)  [Xk)  Pk,n{^n  VA^PnjjiXk) 


VkQk,nil) 


VkQk,nA  J 


Vne 

<  CrA~’" 

Similarly,  the  pth  term  when  p  <  k  can  be  expressed  as 

[  ( Vp  DA{dxo,n)  DA{dxo-.n) 


V^D^Jl)  V^D^Ji) 


tk  {xk-l,Xk)  {pn{Xn)  -  VnAn)) 


^Pjk—li'Hp  )  1  )Qfc— l,n  (1)  (^fc— 1 )  ^p,k—l(jjp  )  1  )Qfc— l,n  (1)  (^fc— 1 ) 


^  ‘hp,fe-l(??^)Qfc-l,n(l) 

Qk{xk  —  1 ;  dXk)Qk,n{_^^  i^Xk) 

Qk—l^n{^'){Xk—l^ 


‘hp,fe-l(??^)Qfc-l,n(l) 
tk  iXk-l,Xk)  Pk,n  An  -  VnAn))  {xk) 


Using  Lemma  17^  for  the  outer  integral  (recall  ^p^k-i{Vp)  =  ^p-i,k-i{Vp-i)) 


\/NE 


'Vp  Dp,nA^0-.n)  (^^l)  D^,nA^0:n) 


I  \  V^D^A^)  %{v^.,)D^A^) 

^  ry  — n—k—k  —  l  —  p 

<  CrP  P 


tk  {xk-l,Xk)  An{Xn)  -  VnAn)) 


Combining  both  cases  for  p  yields 


//ve| 


Vk  ddkA^^O-.n) 


-1 


-  Qn{dxo-A  tk  {xk-l,Xk)  An{Xn)  “  VnAn)) 


k-1 


—n—k 


(7.16) 


p=0 


<  CrA~'"  (  1  + 


1-p 


For  (|7.15L  Lemma  17751  yields  the  following  estimate 

V^DlnAnA  V^DlAk) 


Ane- 


VnAn)  - 


V^D^A^)  /  77f<„(l) 


<  CrA 


(7.17) 


The proof is completed by summing the bounds in (7.16), (7.17) and inflating constant $C_r$
appropriately.

7.1 Proof of Theorem 3.1


n  p 

Cni^n)  -  Cni^Pn)  =  ^  Qn  {dXQ,n)tk  {xk-l,Xk)  {iPn{Xn)  “  Vn  (P’n)) 
k^O 


/' 


i{dxo.,„)tk  ixk-l,Xk)  {(finiXn)  “  VniP’n))  ■ 


To prove the theorem, it will be shown that the error due to the $k$-th term in this expression is


^/ne- 


{dX0:n)tk  {Xk-l,Xk)  {p>n{Xn)  “  rjn  i.'Pn)) 


i{dX0:n)tk  {xk-l,Xk)  {(finiXu)  “  VniP’-n)) 


<  {n  —  k  +  l)CrP 


—n—k 


where constant $C_r$ depends only on $r$ and the bounds in Assumption (A) (through the estimates $\bar{\rho}$
and $\rho^2 \delta^2$ in (7.2) as well as the bounds on the score).

{dXQ-n'^tk  (^Xk—1 ,  Xk')  (^n  iXn)  bn  (^n))  On(dxo:n  )t/i;  (^fc— 1,  Xk')  (^n  (^n)  bn  ip’n)) 

In  (dxo-.n)tk  (xk-l,Xk)  (pn(Xn)  “  b^ (<Pn)) 


V^D^Jdxo-.n)^  ,  J  ,  , 

tk  \Xk—l,Xk)  [  Pn(Xn)  _N  riN 


+ 


I  «n(l) 

Vk  Dk,nidX0:n) 


^k{xk  —  l^Xk^  I  p’niXn) 


'Qn{dxo,n)tk  {xk-l,Xk)  (pn(Xn)  “  Pn{<Pn)) 


(7.18) 


(7.19) 


The proof is completed by summing the bounds in Lemma 7.8 for (7.18) and Lemma 7.9 for (7.19)
and inflating constant $C_r$ appropriately.

7.2 Proof of Theorem 3.2

The  following  result  which  characterizes  the  asymptotic  behavior  of  the  local  sampling  errors  defined 
in  dSU  is  proved  inlPel  Morall  |2004  Theorem  9.3.1] 

Lemma  7.10  Let  {</Jn}n>o  C  B{X).  For  any  9,  y,  n  >  0,  the  random  vector  {V^q{(Pq),  . . . ,  V^^{}pn)) 
converges  in  law,  as  N  ^  oo,  to  (14,0(7^0))  ■  •  ■ )  f4,n(v^n))  where  Vg^i  is  defined  in 

The  following  multivariate  fluctuation  theorem  first  proved  under  slightly  different  assumptions 


in  Del  Moral  et  al.  I  |2mn^  is  needed.  See  alsolPouc  et  al.l  |2009l|  for  a  related  study. 

Theorem  7.11  Assume  (A).  For  any  9,  y,  n>  0,  Fn  G 
in  law,  as  N  ^  00,  to  the  centered  Gaussian  random  variable 

Dg^p^n{Fn  —  Qg,n{Fn)) 


{Fn)  converges 


p—0 


-De,p,n(l) 


where  Vg^p  is  defined  in 
Proof:

Let 

n— 1 

7n  =  n  "nkigivkl  ■)) 

k^O 

and  define  the  unnormalized  measure 

Tn  —  TnQn- 

The  corresponding  particle  approximation  is  T^  =  7^ where  7^  =  Uk^lvk  iaivkl  ■))■  The  result 
is  proven  by  studying  the  limit  of  '/N  (T^  —  r„)  since 

[Qir  -  Qn](F„)  =  T  [r^  _  r„]  (f„  -  q„(f„))  . 

Tn 

Note  that  Lemma  FTdl  implies  7^  converges  almost  surely  to  7„.  The  key  to  studying  the  limit  of 
Vn  (r^  -  r„)  is  the  decomposition 

n 

Vn  [r^  -  r„]  (F„)  =  ^  7p^  F/  (DpAPn))  +  (K) 

p— 0 

where  the  remainder  term  is 


Rn  (Fn)  ■■=  ^p""  i^p'^n)  ^nd  the  function  7^  :=  -  7,„](F„) 

p— 0 


By Slutsky's lemma and by the continuous mapping theorem (see van der Vaart [1998]) it suffices to
show that $R^N_n(F_n)$ converges to 0, in probability, as $N \to \infty$. To prove this, it will be established
that $E\left( R^N_n(F_n)^2 \right)$ is $O(N^{-1})$. Since


p— 0 

and  [7^1  <  Cp  almost  surely,  where  Cp  is  some  non-random  constant  which  can  be  derived  using  (A), 
it  suffices  to  prove  that  E  {F^n)^'^  is  0{N~^).  By  expanding  the  square  one  arrives  at 

E  {F^,nf  I  Fp-l)  <  'i’p  {{F^,nf)  . 

By  Assumption  (A),  for  any  Xp_i  G  A, 

^P  {Vp-i)  {{Fp,nf)  J  Fp^^{Xpf. 


By  Lemma rm  E  (p/  {F^,nf)  is  0{N-^). 

The next lemma is needed to quantify the variance of the particle estimate of the filter gradient
computed using the path-based method. Note that this lemma does not require the hidden chain to
be mixing. We refer the reader to Del Moral and Miclo [2001] for a propagation of chaos analysis.

For  any  9,  y  =  {yn}n>0j  let  {Ve^n}n>o  be  a  sequence  of  independent  centered  Gaussian  ran¬ 
dom  fields  defined  as  follows.  For  any  sequence  of  functions  {Fn  G  S(A"+^)}„>o  and  any  p  >  0, 
{Tg,n(Tn)}7o  is  ^  collection  of  independent  zero-mean  Gaussian  random  variables  with  variances 
given  by 


Eg{Fn{Xo.,n)  |P0:n-l)  —  Eg  (F„  (Ao:„)  |yo:n- 1 


(7.20) 
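Lemma 7.12 Let $\{\delta_\theta\}_{\theta \in \Theta} \subset [1, \infty)$ and assume $\delta_\theta^{-1} \le g_\theta(y|x) \le \delta_\theta$ for all $(x, y, \theta) \in \mathcal{X} \times \mathcal{Y} \times \Theta$. For
any $\theta$, $y$, $n \ge 0$, $\sqrt{N} \left( p^N_\theta(dx_{0:n} | y_{0:n-1}) - Q_{\theta,n} \right)(F_n)$ converges in law, as $N \to \infty$, to
the centered Gaussian random variable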

n 

^  ^  ^6,p  (G^0,p,n  Fo,p,n^  • 

p—0 

where  Ge^p^n  defined  in  ra  and 

Fff^p^n  —  ^oi^F (^Xq-ji^\xq:p,  yp-\-i,ji—i^ 


7.2.1 Proof of Theorem 3.2

It follows from Algorithm 1 that

$$(\zeta^N_n - \zeta_n)(\varphi_n) = Q^N_n(\varphi_n T_n) - Q_n(\varphi_n T_n) + Q_n(\varphi_n)\, Q_n(T_n) - Q^N_n(\varphi_n)\, Q^N_n(T_n). \tag{7.21}$$

The second term on the right hand side of the equality can be expressed as

$$Q_n(\varphi_n)\, Q_n(T_n) - Q^N_n(\varphi_n)\, Q^N_n(T_n) = Q_n\big(\varphi_n Q_n(T_n) + Q_n(\varphi_n) T_n\big) - Q^N_n\big(\varphi_n Q_n(T_n) + Q_n(\varphi_n) T_n\big) + \big(Q^N_n(\varphi_n) - Q_n(\varphi_n)\big) \big(Q_n(T_n) - Q^N_n(T_n)\big). \tag{7.22}$$

Combining the two expressions in (7.21) and (7.22) gives

$$(\zeta^N_n - \zeta_n)(\varphi_n) = Q^N_n\big((\varphi_n - Q_n(\varphi_n)) (T_n - Q_n(T_n))\big) - Q_n\big((\varphi_n - Q_n(\varphi_n)) (T_n - Q_n(T_n))\big) + \big(Q^N_n(\varphi_n) - Q_n(\varphi_n)\big) \big(Q_n(T_n) - Q^N_n(T_n)\big).$$

Using Lemma 7.4 with $r = 2$ and Chebyshev's inequality, we see that $\big(Q^N_n(\varphi_n) - Q_n(\varphi_n)\big)$ converges
in probability to 0. Theorem 7.11 can now be invoked with Slutsky's theorem to arrive at the stated
result in (3.5).

Moving  on  to  the  uniform  bound  on  the  variance,  let 

n 

Tn  -  Qn(T„)  =  y^Jk, 

tk  =  tk  -  Qn(tk), 

Fn  —  Fn  ^n(Fn). 


Also,  the  argument  of  Vp  can  be  expressed  as 

i  /  'i  Qp,n(^)(Xp)  Dp^n  (yFntk  ~  Qn  (^Fntk))  (Xp) 

"  VpQpAr  t',  Dp,n(l)(Xp)  ■ 

It is straightforward to see that $\eta_p(\phi_p) = 0$. Therefore the variance (see (3.4)) now simplifies to


var 

p— 0 


G 


p,n 


Dp,n(Fn  -  Qn(Fn)) 
Dp,n(l) 


=  E^p('^p)- 

p—0 


(7.23) 
Consider  the  function  (j)p.  For  p  <k  — 


Dp,n  Qn  (^p)  f  Vp{^^p)Qp',n{^){^p) 


-^p,n(l)(3^p) 


^  f  Qfc  (^fc  — 1 ;  dXfc  )(5fc^7T,  (1)  (Xfc  ) 

J  Qfc  — l,n  (1)  1 ) 

Using  the  estimates  in  f|3.3p  and  f|7.2p.  this  function  is  bounded  by 

k^p,n  *Qn  (^n^fc))  (^p) 


PpQp,n  (1) 

Qp,k—1  (^p ;  )(5fc_i^7T,  (1)  (Xfc_i ) 

Qp,n  (1)  (^p) 

Qp,k  —  1  (^p 5  l,n(l)(^A;  — l) 

Qp,n(l)(^p) 

tk  i^k—l  ^  ^k^Pk,n  i,Pn')  (^A; )  ■ 


sup 


-^p,n(l){^p) 


(7.24) 


for  some  constant  C.  When  p> 

Dp^n  i^Pnik  Qn  (^p) 


-^p,n(l)(^p) 
77p((iXp)(5p,n(l)  (^p) 


J  ^pQp,n(l) 

Again  using  the  estimates  in  (13.31) ,  f|7.2p  and  f|7.3p , 

^p,n  i^Pn^k  Qn  (v^n^fc))  (^p) 


(Adp(tA:)  (^p)-fp,n  {_Pn  )  (^p)  A^p  (^A:)  {,^p^Pp,n(^Pn  )  (^p))  • 


sup 


^p,n(l)(^p) 


<  Cp 


'—n—k 


(7.25) 


Combining  (|7.24p  and  (|7.25p. 


f-n-p 


sup  |(/>p(xp)|  <  - - —PCp'^  ^  (n-p) 

t  P 


$0 \le p \le n$. Combining this bound with (7.23) will establish the result.